Redesign live-to-final assistant replies by franksong2702 · Pull Request #3401 · nesquena/hermes-webui

franksong2702 · 2026-06-02T11:14:38Z

Thinking Path

Hermes WebUI's most important interaction surface is the running agent session: users need to understand live progress, tool activity, replay/recovery state, and where the final answer begins.
Prior fixes covered individual symptoms such as interim progress, tool cards, compression, replay, stale streams, and session switching, but the product model still needed one coherent live-to-final assistant reply lifecycle.
This PR implements the first slice of #3400 ("Redesign live-to-final assistant replies for running agent sessions"): visible progress is strengthened as a prompt contract, live-only compression state is shown while useful, settled/final content stops retaining compression status text, and stream ownership/reconnect paths avoid losing the active live reply.
During validation, duplicate same-session stream ownership and stale reconnect/replay behavior were blocking the UX from being reliable, so those are included as supporting fixes.

Refs #3400.
Refs #3014 and supersedes #3015.

What Changed

Strengthened the WebUI visible-progress prompt contract, absorbing the narrow direction of #3015 ("Restore visible WebUI progress contract") into this PR:
- long tool-running WebUI turns should not appear silent
- visible progress must be normal assistant content, not only hidden reasoning/tool output
- models are told not to run many independent tool batches back-to-back without visible assistant text
- regression coverage rejects the old optional "you may provide" wording
Adjusted Automatic Compression UX:
- live shows a centered non-interactive divider: "Compressing context"
- completion shows "Context auto-compressed" while the run continues
- settled/final Activity removes automatic-compression status text
- the divider typography is muted and non-bold so it reads as lifecycle chrome, not assistant content
Hardened live reattach and replay:
- active run-journal replay honors bounded cursor windows
- stale cursor-only INFLIGHT state is discarded before reattach
- explicit reconnect reopens stale CONNECTING EventSource instances
Fixed supporting stream ownership cases:
- chat start rechecks same-session stream ownership under the per-session lock
- duplicate starts for the same session reuse the current stream instead of creating a hidden ghost stream
Added regression coverage for visible progress prompt semantics, compression display, stale stream cleanup, and same-session inflight stream reuse.
Updated UI/UX docs, the run-state consistency RFC, DESIGN, and CHANGELOG for the live-only compression semantics.
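
The same-session stream ownership recheck listed above can be pictured with a minimal sketch. It assumes a simple in-memory registry; `_session_locks`, `_active_streams`, and `start_chat_stream` are hypothetical names chosen for illustration and are not the identifiers used in api/routes.py. The point is only that the "is there already a stream for this session?" check and the claim happen under the same per-session lock, so a duplicate start reuses the current stream instead of creating a second, hidden one.

```python
import threading
import uuid

# Hypothetical in-memory registry, for illustration only.
_registry_guard = threading.Lock()
_session_locks = {}
_active_streams = {}


def _session_lock(session_id):
    """Return the lock dedicated to one session, creating it on first use."""
    with _registry_guard:
        return _session_locks.setdefault(session_id, threading.Lock())


def start_chat_stream(session_id):
    """Return (stream_id, reused) for the session.

    The ownership check is repeated inside the per-session lock, so two
    concurrent starts for the same session cannot both create a stream.
    """
    with _session_lock(session_id):
        existing = _active_streams.get(session_id)
        if existing is not None:
            return existing, True  # reuse instead of creating a ghost stream
        stream_id = uuid.uuid4().hex
        _active_streams[session_id] = stream_id
        return stream_id, False
```

A second call for the same session returns the first stream id with `reused=True`, while a different session gets its own stream.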

Why It Matters

Running agent sessions are where users build trust in Hermes WebUI. The UI should make active work legible without confusing internal lifecycle state for final assistant content. This PR moves the experience closer to mature agent clients such as Codex and Claude Code: progress remains visible while work is happening, lifecycle detail is available when useful, and the final answer remains readable and distinct.

Contract Routing

Contract family: visible progress prompt contract, streaming/replay/run-state consistency, UI/UX assistant reply lifecycle, Automatic Compression display semantics.
Evidence used: docs/rfcs/webui-run-state-consistency-contract.md, docs/UIUX-GUIDE.md, DESIGN.md, focused frontend/static tests, run-journal replay tests, and manual 8788 live-session validation.
Contract change: visible interim progress for long tool-running WebUI turns is now firm prompt-contract language rather than optional guidance. Live-only Automatic Compression status is treated as transient running-session UI, not persistent settled transcript content. Final settled Activity keeps the Worklog, but removes automatic-compression status dividers.

Verification

node --check static/ui.js static/messages.js static/sessions.js static/workspace.js static/panels.js static/i18n.js
git diff --check origin/master
python3 scripts/ruff_lint.py --diff origin/master
- Result: no changed Python files vs origin/master
python -m pytest -q tests/test_sprint42.py tests/test_auto_compression_card.py tests/test_stale_stream_cleanup.py tests/test_inflight_stream_reuse.py tests/test_run_journal_routes.py tests/test_run_journal_frontend_static.py
- Result: 108 passed, 1 warning
python -m pytest tests/ -q --timeout=60 --shard-id=0 --num-shards=3
- Result: 2391 passed, 6 skipped, 2 xpassed, 1 warning; one local failure in tests/test_profile_skills_stats.py::test_get_profile_skills_stats from the macOS platform fixture assumption, unrelated to this PR's diff
Manual 8788 validation:
- Spark and MiniMax-M3 were available in the isolated dev runtime
- live sessions triggered Auto Compression
- Auto Compression showed "Compressing context" and "Context auto-compressed" as centered live dividers
- automatic-compression dividers did not remain as final answer content
- tool/lifecycle chrome was visually quieter than assistant prose in dark and light skins

Screenshots

Live running state with prose, muted tool rows, and the centered compression divider:

Compression completion while the run continues:

Dark theme live state with prose, quiet tool row, token/timer footer:

Expanded quiet tool rows remain visually subordinate to assistant prose:

Final settled state keeps the folded L1 Worklog above assistant content:

Risks / Follow-ups

This PR absorbs the narrow prompt-contract slice from #3015 ("Restore visible WebUI progress contract") because the live-to-final assistant reply design depends on models reliably emitting visible progress prose.
This PR intentionally keeps the implementation slice narrower than the whole design space of #3400 ("Redesign live-to-final assistant replies for running agent sessions").
Follow-up areas intentionally left out of this PR:
- queue composer behavior during compression
- explicit degraded/rebuild status during slow reattach
- native SSE Last-Event-ID support
- max tool-call iteration / compression-exhausted terminal taxonomy refinements
- broader sidebar/session awareness improvements

Model Used

AI-assisted.

Provider: OpenAI / Codex
Model: GPT-5 Codex for implementation, debugging, merge preparation, and PR drafting
Additional validation model: GPT 5.3 Codex Spark was used in the local 8788 runtime to trigger running-session and Auto Compression scenarios

nesquena-hermes · 2026-06-03T20:07:08Z

This is a large slice (57 files, ~6k lines), so I focused on the one part with real runtime-behavior risk: the rewritten "missing final assistant reply" detection in api/streaming.py. I read the new helpers _session_lacks_final_assistant_answer (api/streaming.py:3497-3527) and _agent_result_terminal_failure (api/streaming.py:3529-3540), the call site (api/streaming.py:5690-5693), and diffed against the old guard on origin/master:api/streaming.py:5608-5609. The compression-exhausted classification (_classify_provider_error, new compression_exhausted branch) and the post-compression tool-result pruning (_prune_context_tool_results_after_compression) both look sound and defensive. One behavior change deserves a second look before merge.

The `_token_sent` guard was dropped, which can turn a successful tool-terminal turn into an error

On master the guard suppressed the silent-failure error whenever any text was streamed:

# origin/master:5608-5609
# _token_sent tracks whether on_token() was called (any streamed text)
if not _assistant_added and not _token_sent:

The PR replaces that with:

# HEAD:5690-5693
_terminal_failure = _agent_result_terminal_failure(result) or _session_lacks_final_assistant_answer(_all_result_messages)
if _terminal_failure:
    _assistant_added = False
if _terminal_failure or not _assistant_added:

_session_lacks_final_assistant_answer returns True whenever the transcript ends on a tool row (api/streaming.py:3505-3506):

role = msg.get('role')
if role == 'tool':
    return True

Crucially, it makes this decision purely from message shape and ignores the result's success status. Because the shape check is OR'd with _agent_result_terminal_failure(result), even a result whose status is done is overridden to _assistant_added = False if the last persisted message is a tool result. The downstream branch then classifies with silent_failure=not bool(_err_str); with no error string this yields the "No response from provider" apperror. So a turn that streamed visible text and ended cleanly on a final tool batch with no closing assistant sentence now surfaces an inline error to the user, where master let it complete (because _token_sent was true).

That interaction is sharpened by this very PR: the strengthened progress contract (_WEBUI_PROGRESS_PROMPT, new lines telling the model to "say what you just confirmed and what you will check next before continuing with more tools") makes "emit prose, then run a final tool batch, then stop" a more likely shape, not less. A model that does exactly what the new prompt asks and whose last action is a successful tool call would get flagged.

Suggestion

Gate the shape-based check on the result not already reporting success, and/or restore the streamed-text escape hatch:

_terminal_failure = _agent_result_terminal_failure(result)
if not _terminal_failure and not _token_sent:
    _terminal_failure = _session_lacks_final_assistant_answer(_all_result_messages)

That keeps the genuine target case (agent fails mid-tool-run, nothing streamed, no final answer) firing, while not penalizing a turn that produced visible progress and merely ended on a tool result. If ending-on-tool is intentionally treated as failure even when text streamed, please say so in a comment near api/streaming.py:3505 and add a test_live_stream_ux case pinning "streamed text + tool-terminal + success status" to the chosen outcome — right now _session_lacks_final_assistant_answer's status-blindness isn't covered by an assertion that distinguishes it from _agent_result_terminal_failure.
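
To make the suggested gating concrete, here is a small self-contained sketch of the three cases discussed above. The two helpers are simplified stand-ins, not the real functions in api/streaming.py: the status strings (`"failed"`, `"partial"`, `"compression_exhausted"`, `"done"`) and the "look only at the last message" rule are assumptions made for illustration.

```python
# Assumed status vocabulary; the real result object may differ.
FAILURE_STATUSES = {"failed", "partial", "compression_exhausted"}


def agent_result_terminal_failure(result):
    """Stand-in: True only when the result explicitly reports a failure status."""
    return result.get("status") in FAILURE_STATUSES


def session_lacks_final_assistant_answer(messages):
    """Stand-in shape check: True when the transcript does not end on
    a non-empty assistant message (for example, it ends on a tool row)."""
    if not messages:
        return True
    last = messages[-1]
    if last.get("role") != "assistant":
        return True
    return not str(last.get("content") or "").strip()


def is_terminal_failure(result, messages, token_sent):
    """Explicit status is authoritative; the status-blind shape check
    only runs when nothing was streamed to the user."""
    failure = agent_result_terminal_failure(result)
    if not failure and not token_sent:
        failure = session_lacks_final_assistant_answer(messages)
    return failure
```

With this gating, "streamed text + tool-terminal + success status" is not flagged, while "nothing streamed + tool-terminal" and any explicit failure status still are.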

Everything else I sampled in streaming.py reads cleanly. I didn't execute the suite (cron policy), but the diff's own claim of 108 passed on the focused files plus the message-wording changes (Compressing context / Compression finished) line up with the test_auto_compression_card.py updates. Worth splitting the non-streaming doc/CHANGELOG churn from the behavior change if you want a tighter review surface, but that's process, not correctness.

The provider/model reasoning-effort coercion (coerce_reasoning_effort_for_model, _filter_reasoning_efforts_for_provider, and their call-site wrappers) is unrelated to the live-to-final assistant reply experience and changes behavior for all reasoning-capable models. Reverting it here keeps PR nesquena#3401 focused on live stream / worklog / auto-compression / stream-ownership. The StreamChannel snapshot/event-id changes in config.py are part of the live-stream replay work and intentionally remain. The coercion ships separately so it gets a provider-capability-focused review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

greptile-apps · 2026-06-04T00:25:15Z

Greptile Summary

This PR implements the first slice of a cohesive live-to-final assistant reply lifecycle for Hermes WebUI (#3400): it hardens the visible-progress prompt contract, redesigns Automatic Compression UX (live divider → "Compressing context" / "Context auto-compressed" / removed from settled transcript), and fixes several stream ownership and replay bugs that were blocking a reliable live session experience.

Prompt contract: _WEBUI_PROGRESS_PROMPT now requires visible interim prose between tool batches; regression tests confirm the old optional wording is rejected.
Reconnect & replay hardening: StreamChannel gains subscribe_with_snapshot and note_last_event_id; the SSE handler replays the journal up to the subscriber's last_event_id cursor and then skips already-seen events from the offline buffer, resolving the duplicate-delivery issue flagged in earlier reviews.
Session-switch lifecycle: _start_chat_stream_for_session re-checks stream ownership inside the session lock; stale CONNECTING EventSource instances are now torn down on explicit reconnect; stale cursor-only INFLIGHT entries are discarded before re-attach; and loadSession propagates activityBurstAnchors, lastAssistantText, and related fields so multi-burst live turns survive session switches.
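
As a rough illustration of the snapshot idea in the summary above, the sketch below shows a channel that records the newest event id under a lock and hands it to a new subscriber together with its queue. `StreamChannelSketch` is a hypothetical class written for this example; only the method names `note_last_event_id`, `put_nowait`, and `subscribe_with_snapshot` come from the discussion, and their real signatures in api/config.py may differ.

```python
import queue
import threading


class StreamChannelSketch:
    """Illustrative only: buffers items while nobody is subscribed and
    exposes the last noted event id as a snapshot on subscribe."""

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = []
        self._offline_buffer = []
        self._last_event_id = None

    def note_last_event_id(self, event_id):
        # Side-channel update: keeps the cursor current without changing
        # the shape of the queued items.
        with self._lock:
            self._last_event_id = event_id

    def put_nowait(self, item):
        with self._lock:
            if not self._subscribers:
                self._offline_buffer.append(item)
                return
            targets = list(self._subscribers)
        for subscriber in targets:
            subscriber.put_nowait(item)

    def subscribe_with_snapshot(self):
        subscriber = queue.Queue()
        with self._lock:
            # Drain buffered items and read the cursor in one critical
            # section so the snapshot matches what the queue contains.
            for item in self._offline_buffer:
                subscriber.put_nowait(item)
            self._offline_buffer.clear()
            self._subscribers.append(subscriber)
            snapshot = {"last_event_id": self._last_event_id}
        return subscriber, snapshot
```

The caller can then use `snapshot["last_event_id"]` as the upper bound for journal replay and as the cutoff for skipping already-replayed items from the queue.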

Confidence Score: 5/5

This PR is safe to merge. The reconnect dedup logic, stream ownership re-check under lock, and INFLIGHT lifecycle changes are all well-tested and the primary code paths are correct.

The three issues flagged in earlier reviews (undefined _completeAutomaticCompressionOnLiveProgress, reconnect dedup no-op, indentation) are all addressed. The new StreamChannel snapshot mechanism correctly pairs note_last_event_id with put_nowait to keep _last_event_id in sync, and the snapshot_cutoff_seq dedup guard prevents offline-buffer duplicates on reconnect. The _start_chat_stream_for_session while-loop re-check is bounded in practice. The remaining findings are style nits with no behavioral impact.

No files require special attention. The commented-out line in sessions.js and the minor _hashString allocation in messages.js are cosmetic.

Important Files Changed

Filename	Overview
api/config.py	StreamChannel gains subscribe_with_snapshot, note_last_event_id, and 3-tuple put_nowait; _last_event_id is now maintained atomically under _lock; well-structured and tested.
api/routes.py	Replay dedup logic (snapshot_cutoff_seq / replay_cutoff_seq) is correct for the common case; _start_chat_stream_for_session loop correctly re-checks ownership under lock.
api/streaming.py	Adds 3-tuple event queueing with note_last_event_id, post-compression tool-result pruning, and updated compression message strings; changes are focused and low-risk.
api/run_journal.py	Adds max_seq parameter to read_run_events for upper-bound filtering; minimal, correct change.
static/messages.js	_completeAutomaticCompressionOnLiveProgress now defined; activityBurstAnchor / segmentSeq tracking added to INFLIGHT; _hashString allocates String(value
static/sessions.js	loadSession INFLIGHT deletion for journalReplayFromStart/stale cursor entries is correct; _loadingSessionId guard fix allows same-session reload during pending switches; commented-out line left in the non-INFLIGHT active-stream branch.
static/ui.js	Large addition of worklog/live-run-status helpers; ensureLiveWorklogShell, showLiveRunStatus, _moveLiveRunStatusToTurnEnd all well-guarded with typeof checks; _stripVisibleAssistantEchoFromThinking semantics narrowed to exact-match only.
static/style.css	Large style rework for live-worklog, compression divider, run-status footer; muted typography for lifecycle chrome vs assistant prose.
tests/test_inflight_stream_reuse.py	New static-analysis tests covering same-stream reuse, CONNECTING transport rejection on reconnect, and same-session no-op guard fix.
tests/test_auto_compression_card.py	Tests for _completeAutomaticCompressionOnLiveProgress definition and per-event-listener call, and elapsed-timer no-op after PR changes.

Sequence Diagram

sequenceDiagram
    participant FE as Frontend (sessions.js / messages.js)
    participant SC as StreamChannel (config.py)
    participant RJ as RunJournal (run_journal.py)
    participant SSE as SSE Handler (routes.py)
    participant Worker as Streaming Worker (streaming.py)

    Note over FE,Worker: Live turn in progress
    Worker->>SC: note_last_event_id(event_id)
    Worker->>SC: put_nowait((event, data, event_id))
    SC->>SC: "_last_event_id = event_id"
    SC-->>FE: broadcast 3-tuple to active subscribers

    Note over FE,Worker: User switches session — SSE disconnects
    SC->>SC: Buffer items in _offline_buffer

    Note over FE,Worker: User switches back — reconnect
    FE->>SSE: "GET /api/chat/stream?replay=1&after_seq=N&after_event_id=X"
    SSE->>SC: subscribe_with_snapshot()
    SC-->>SSE: "(subscriber_queue, {last_event_id: Y})"
    SSE->>SSE: "snapshot_cutoff_seq = parse(Y)"
    SSE->>RJ: "_replay_run_journal(after_seq=N, max_seq=Y, include_stale=False)"
    RJ-->>SSE: journal events N+1 to Y
    SSE-->>FE: replay events via SSE
    SSE->>SSE: "replay_cutoff_seq = Y"

    loop Live stream tail
        SC-->>SSE: item from subscriber_queue
        SSE->>SSE: "event_seq = parse(item.event_id)"
        alt "event_seq <= replay_cutoff_seq"
            SSE->>SSE: skip duplicate
        else "event_seq > replay_cutoff_seq"
            SSE-->>FE: emit SSE event
        end
    end
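
The replay-then-tail flow in the diagram can be sketched as a pure function. This is a simplified model, not the routes.py handler: it assumes event ids end in an integer sequence after a colon (for example `run:4`), and the names `parse_seq` and `replay_then_tail` are hypothetical.

```python
def parse_seq(event_id):
    """Assumed id format '<anything>:<int>' or a bare integer string;
    returns None when no sequence number can be read."""
    if not event_id:
        return None
    tail = str(event_id).rsplit(":", 1)[-1]
    return int(tail) if tail.isdigit() else None


def replay_then_tail(journal, live_items, after_seq, snapshot_last_event_id):
    """Replay journal events in (after_seq, cutoff], then emit only live
    items whose sequence is strictly greater than the cutoff."""
    snapshot_cutoff_seq = parse_seq(snapshot_last_event_id)
    emitted = []
    for event in journal:
        seq = parse_seq(event["event_id"])
        if seq is None or seq <= after_seq:
            continue  # already seen by the client
        if snapshot_cutoff_seq is not None and seq > snapshot_cutoff_seq:
            continue  # will arrive through the live queue instead
        emitted.append(event)
    replay_cutoff_seq = snapshot_cutoff_seq
    for item in live_items:
        seq = parse_seq(item["event_id"])
        if replay_cutoff_seq is not None and seq is not None and seq <= replay_cutoff_seq:
            continue  # duplicate of a replayed journal event
        emitted.append(item)
    return emitted
```

For a journal holding sequences 1 to 5, a client cursor of 2, a snapshot id of `run:4`, and a live queue holding 3 to 6, the function emits 3 and 4 from the journal and 5 and 6 from the queue, so no event is delivered twice.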

Reviews (17): Last reviewed commit: "Close #3401 merge review test gaps"

franksong2702 · 2026-06-04T01:21:33Z

Pushed follow-up commits addressing review feedback. Summary of what changed on this branch:

207c09f9 — Fix false "no response" on streamed tool-terminal turns (addresses @nesquena-hermes's review)
The missing-final-assistant guard OR'd the status-blind _session_lacks_final_assistant_answer check in unconditionally, so a successful turn that streamed visible progress and ended on a final tool batch (or a leaked role=user control message) was reclassified as a terminal failure and surfaced a false error. The shape check is now gated behind not _token_sent, while _agent_result_terminal_failure(result) stays authoritative for explicit failure/partial/compression-exhausted status. Added behavioral unit tests for both helpers (tool-tail, user-tail, empty-messages, error-tail, success) and realigned the static guard to the corrected semantics.

4c48968b — Removed the reasoning-effort coercion from this PR. It's unrelated to the live-to-final reply experience and changes behavior for all reasoning-capable models, so it now ships as its own focused PR (#3505). The StreamChannel snapshot/event-id changes in api/config.py stay here because they're part of the reconnect-replay work.

b69bad5a — Addresses the Greptile review:

P1 (reconnect dedup was a no-op): correct catch — the worker keeps the queue 2-tuple (Stage-364) and propagates the journal event id via the STREAM_LAST_EVENT_ID side-channel, so StreamChannel._last_event_id was never populated, snapshot_cutoff_seq/replay_cutoff_seq never engaged, and offline-buffer events could be redelivered after journal replay. Rather than switch to a 3-tuple (which would break the Stage-364 design and its static tests), added StreamChannel.note_last_event_id() and call it from the worker put() and cancel_stream, so the cutoff now actually engages while the 2-tuple queue shape is preserved. Added a unit test for the snapshot wiring plus a static guard so put() can't silently regress to inert.
P2 (unbounded while True): the ownership-recheck loop is now capped at 3 attempts and returns 409 instead of spinning if stale cleanup keeps succeeding while another thread re-claims the session.
P2 (misleading indentation in loadSession): re-indented the INFLIGHT block; whitespace-only (git diff -w shows no non-whitespace change, node --check passes).
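
The capped ownership-recheck loop from the first P2 item can be sketched as follows. The callback-based shape, the name `resolve_stream_owner`, and the `(status, stream_id)` return value are assumptions for illustration; in the real handler this logic runs under the per-session lock.

```python
MAX_OWNERSHIP_ATTEMPTS = 3  # cap taken from the comment above


def resolve_stream_owner(get_owner, is_stale, cleanup, claim,
                         max_attempts=MAX_OWNERSHIP_ATTEMPTS):
    """Return (http_status, stream_id).

    - no owner: claim a new stream
    - live owner: reuse it
    - stale owner: clean up and retry, at most max_attempts times
    """
    for _ in range(max_attempts):
        owner = get_owner()
        if owner is None:
            return 200, claim()
        if not is_stale(owner):
            return 200, owner
        cleanup(owner)
    # Stale cleanup kept succeeding while another thread re-claimed the
    # session: give up with a conflict instead of spinning.
    return 409, None
```

If another thread keeps re-claiming the session with a stream that looks stale, the loop performs three cleanups and then reports 409 rather than looping forever.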

Local verification: tests/test_cancelled_turn_status.py, test_webui_runtime_diagnostics.py, test_stage364_opus_live_sse_event_id.py, test_run_journal_routes.py, test_stale_stream_cleanup.py, test_inflight_stream_reuse.py, test_run_journal_streaming_static.py, test_sprint42.py, test_regressions.py all pass locally; full matrix runs in CI.

Note: this branch will periodically re-conflict on CHANGELOG.md only (the [Unreleased] block) whenever a new release lands on master — no code conflicts.

🤖 Generated with Claude Code

franksong2702 · 2026-06-04T07:24:38Z

Updated with merge commit 53727cb to bring the branch onto latest master, resolve the CHANGELOG conflict, and remove the duplicate _active_stream_ids import noted in review. Local verification: git diff --check; pytest tests/test_cancelled_turn_status.py tests/test_webui_runtime_diagnostics.py tests/test_stage364_opus_live_sse_event_id.py tests/test_run_journal_routes.py tests/test_stale_stream_cleanup.py tests/test_inflight_stream_reuse.py tests/test_run_journal_streaming_static.py tests/test_sprint42.py tests/test_regressions.py -q (179 passed).

franksong2702 · 2026-06-05T03:21:02Z

Conflict cleanup and CI fallout fixes pushed through 183e8258.

What changed:

Merged latest origin/master into franksong2702/live-to-final-assistant-replies and resolved conflicts in the live-to-final scope.
Restored merge-lost static UI invariants from current master:
- partial tool-call assistant rows remain visible/anchorable;
- thinking-only messages in simplified tool-calling mode render inline instead of inside a collapsed activity group;
- persistent-state tool_complete notifications see tc.is_error before toast classification;
- live compression card replacement restores the scroll snapshot before follow-settle.
No active Stop changes from #3345 ("fix(runtime): make cancelStream() owner-aware and close its SSE source") or unrelated mobile titlebar changes were folded into this PR.

Verification:

python -m pytest tests/test_issue401.py tests/test_issue3592_thinking_settlement.py -q -> 16 passed
python -m pytest tests/test_issue3340_persistent_state_toasts.py tests/test_issue3479_ios_stream_scroll_jump.py tests/test_issue401.py tests/test_issue3592_thinking_settlement.py -q -> 27 passed
python -m pytest tests/test_cancelled_turn_status.py tests/test_webui_runtime_diagnostics.py tests/test_stage364_opus_live_sse_event_id.py tests/test_run_journal_routes.py tests/test_stale_stream_cleanup.py tests/test_inflight_stream_reuse.py tests/test_run_journal_streaming_static.py tests/test_sprint42.py tests/test_regressions.py tests/test_webui_gateway_chat_backend.py -q -> 200 passed
node --check static/messages.js static/ui.js static/sessions.js static/boot.js
git diff --check
GitHub Actions: 11/11 passed on 183e8258

AI assistance: Codex coordinated the merge cleanup with a sub-agent, reviewed the returned diff, fixed the CI-reported merge fallout, and reran focused verification before pushing.

franksong2702 · 2026-06-05T08:05:59Z

Conflict cleanup pushed in 19598a70.

What changed:

Merged latest origin/master (4c545a33) into franksong2702/live-to-final-assistant-replies.
Resolved conflicts in CHANGELOG.md and static/messages.js.
Kept this PR's live-to-final Unreleased notes while preserving current release history.
Preserved the current master terminal-event stale-stream bail-out in done while keeping this PR's immediate _streamFinalized behavior, so the prior merge-lost static UI invariants remain intact.

Verification:

git diff --check -> passed
node --check static/messages.js static/sessions.js static/ui.js static/boot.js -> passed
/Users/xuefusong/.hermes/hermes-agent/venv/bin/python -m pytest tests/test_cancelled_turn_status.py tests/test_webui_runtime_diagnostics.py tests/test_stage364_opus_live_sse_event_id.py tests/test_run_journal_routes.py tests/test_stale_stream_cleanup.py tests/test_inflight_stream_reuse.py tests/test_run_journal_streaming_static.py tests/test_cancel_stream_owner_guard.py tests/test_issue3587_intermediate_reasoning.py tests/test_session_events.py -q -> 132 passed
GitHub Actions on 19598a70 -> 11/11 passed
GitHub mergeability -> MERGEABLE / CLEAN

AI assistance: Codex performed the conflict cleanup and focused regression review.

franksong2702 · 2026-06-05T08:59:08Z

Conflict cleanup pushed in 72b838fd.

What changed:

Merged latest origin/master (f1211e1f, v0.51.267) into franksong2702/live-to-final-assistant-replies.
Resolved the CHANGELOG.md conflict by keeping this PR's live-to-final notes in Unreleased and preserving the new v0.51.267 security release notes.
api/routes.py auto-merged with the v0.51.267 security hardening changes; no manual behavior conflict was needed there.

Verification:

git diff --check origin/master..HEAD -> passed
node --check static/messages.js static/sessions.js static/ui.js static/boot.js -> passed
python -m pytest tests/test_cancelled_turn_status.py tests/test_webui_runtime_diagnostics.py tests/test_stage364_opus_live_sse_event_id.py tests/test_run_journal_routes.py tests/test_stale_stream_cleanup.py tests/test_inflight_stream_reuse.py tests/test_run_journal_streaming_static.py tests/test_cancel_stream_owner_guard.py tests/test_issue3587_intermediate_reasoning.py tests/test_session_events.py tests/test_issue1909_csrf_token.py tests/test_issue2931_edge_tts_endpoint.py tests/test_sprint29.py -q -> 216 passed
GitHub Actions on 72b838fd -> 11/11 passed
GitHub mergeability -> MERGEABLE / CLEAN

AI assistance: Codex Autopilot performed the branch refresh, reviewed final scope, ran focused verification, pushed, and read back GitHub state.

franksong2702 · 2026-06-06T09:45:59Z

Updated the #3401 Thinking/Worklog blocker.

Product model:

Worklog remains the live-to-final record for an assistant turn.
Thinking is preserved as its own Worklog Thinking Card, sibling to process prose and Tool Cards.
Thinking is not promoted into final answer text and is not treated as a Tool Card.

Implementation:

Live Thinking Cards are now segment-scoped, so later reasoning does not keep updating the first Thinking Card in the turn.
Settled rendering keeps Thinking Cards in the folded Worklog.
Duplicate suppression is intentionally narrow: exact / normalized-exact only against visible process/final text from the same assistant turn.
Reasoning metadata is preserved even when the visible Thinking Card is suppressed.

Tradeoff:

Partial overlap and semantic dedupe are intentionally not handled in this PR. That avoids live content jumping and avoids model/provider-specific behavior.
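
A minimal sketch of the narrow duplicate-suppression rule described above. What "normalized-exact" means in the actual static/ui.js code is not spelled out here, so this example assumes it means "equal after collapsing whitespace"; `should_hide_thinking` and `_normalize` are hypothetical names.

```python
import re


def _normalize(text):
    """Assumed normalization: collapse whitespace runs and trim the ends."""
    return re.sub(r"\s+", " ", text or "").strip()


def should_hide_thinking(thinking, visible_texts):
    """Hide a Thinking Card only when it exactly echoes visible process or
    final text from the same turn. Partial overlap is never treated as a
    duplicate."""
    key = _normalize(thinking)
    if not key:
        return False
    return any(key == _normalize(visible) for visible in visible_texts)
```

Because the comparison is whole-string equality, a Thinking Card that merely shares a prefix or a sentence with the visible text stays in the Worklog, which matches the stated tradeoff.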

Verification:

node --check static/messages.js static/ui.js
pytest tests/test_regressions.py tests/test_ui_tool_call_cleanup.py tests/test_issue2565_reasoning_accumulation.py tests/test_issue3592_thinking_settlement.py tests/test_issue_progress_echo_dedupe.py tests/test_issue2454_active_session_spinner.py
npx --yes eslint@10.4.0 --no-config-lookup -c eslint.runtime-guard.config.mjs "static/**/*.js"

…to-final-assistant-replies # Conflicts: # api/streaming.py # static/messages.js # static/ui.js # tests/test_issue765_streaming_persistence.py # tests/test_ui_tool_call_cleanup.py

franksong2702 · 2026-06-06T10:27:49Z

Follow-up after the latest refresh/re-gate:

Refreshed #3401 onto current origin/master and restored the reviewer-branch guardrails that were easy to lose during conflict resolution: idle attention-dot visibility/color, the dead settleLiveCompressionCards() removal, and stale test expectations around #3316 ("Fix compression-exhausted stream finalization") terminal-failure semantics and #3401 Thinking replay.
The intended product model is unchanged: process prose, Thinking Card, and Tool Card are sibling items inside the Worklog. Thinking is not silently dropped and is not treated as a Tool Card or Final Answer.
Settled duplicate suppression is intentionally narrow: only exact / normalized-exact Thinking echoes are hidden from the folded Worklog. We keep the underlying reasoning metadata, preserve non-exact provider reasoning, and do not attempt partial-overlap, semantic, or model-specific dedupe. That avoids live-stream content jumping while still removing the obvious Spark/openai-codex exact echo in the settled view.
Latest head 002a57a6 is green on Browser smoke and the full Tests matrix, including lint and the Scope / undefined-reference gate.

…ds on reconnect, fixes #3707) (#3766)

* fix(streaming): replay restored live tool cards on reconnect (#3763, fixes #3707)

Post-#3401 (#3400 live-to-final epic) recovery residual. When a running session is restored from its in-memory live-turn snapshot and then reattached to the SSE stream, the restore-success path skipped replaying persisted live tool calls, leaving restored live text/thinking but an EMPTY Worklog until a later SSE event or the final render rebuilt the turn.

- Extract the persisted-tool-card replay into replayPersistedLiveToolCards() (reads S.toolCalls or INFLIGHT[sid].toolCalls); run it on restoredLiveTurn && didReconnect, not only the !restoredLiveTurn fallback.
- Dedup safety: restore-success replay passes {skipUnkeyedRestoredDuplicates:true}. When the restored snapshot already has .tool-card-row rows, an UNKEYED persisted tool is skipped to avoid a duplicate; keyed cards still replay and appendLiveToolCard's tid-dedup replaces the correct restored row.
- appendLiveToolCard() and the new liveToolReplayId() both key on tid||id||tool_call_id||tool_use_id||call_id (consistent 5-alias set), so the dedup covers all known id shapes.
- Both replay sites pass {sessionId, streamId} so the ownership guard applies.
- Regression coverage: restore-success+reconnect replays tools; unkeyed-restored duplicates skipped; all-id-alias dedup; prior ordering invariants preserved.

Correct post-#3401 fix for #3707 (supersedes the closed #3724).

Co-authored-by: franksong2702 <[email protected]>

* docs(changelog): stamp v0.51.309 - Release JY (stage-a5b #3763)

---------

Co-authored-by: nesquena-hermes <[email protected]>
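
The id-alias dedup described in the commit message above can be modeled in a few lines. The real code is JavaScript in the WebUI's static files; this Python version is only a sketch of the rule, and the list-of-dicts representation of tool card rows is an assumption made for the example.

```python
# Alias order taken from the commit message.
ID_ALIASES = ("tid", "id", "tool_call_id", "tool_use_id", "call_id")


def live_tool_replay_id(tool):
    """Return the first non-empty id alias, or None for an unkeyed tool."""
    for key in ID_ALIASES:
        value = tool.get(key)
        if value:
            return str(value)
    return None


def replay_persisted_tools(restored_rows, persisted,
                           skip_unkeyed_restored_duplicates=False):
    """Merge persisted tool calls into already-restored rows.

    Keyed tools replace the restored row with the same id; unkeyed tools
    are skipped when restored rows exist and the skip flag is set.
    """
    rows = list(restored_rows)
    had_restored_rows = bool(rows)
    index = {}
    for position, row in enumerate(rows):
        key = live_tool_replay_id(row)
        if key is not None:
            index[key] = position
    for tool in persisted:
        key = live_tool_replay_id(tool)
        if key is None:
            if skip_unkeyed_restored_duplicates and had_restored_rows:
                continue  # cannot prove it is new, so avoid a duplicate
            rows.append(tool)
        elif key in index:
            rows[index[key]] = tool  # replace the matching restored row
        else:
            index[key] = len(rows)
            rows.append(tool)
    return rows
```

Because every alias maps to the same key space, a restored row keyed by `tid` is matched by a persisted entry keyed by `tool_call_id` with the same value.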

* docs(rfc): add Transparent Stream activity display mode RFC (#3820)

Proposes Transparent Stream as an opt-in, chronological activity display mode alongside the default Compact Worklog (#3400/#3401). Captures the display-mode split agreed in #3820: each tool call as a first-class chronological event, interleaved with reasoning/progress, with compact previews, consistent across live, settled, and reload/replay paths. Documents the asymmetry in the existing `simplified_tool_calling` toggle (live-only, no settled/reload branch) and the three concrete integration points so the follow-up can be sliced safely. Doc-only; no behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(rfc): refine Transparent Stream rollout scope

---------

Co-authored-by: Frank Song <franksong2702@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

/#3876) (#3886)

Fixes #3869: empty legacy three-dot thinking spinners piled up as stale rows after the agent finished thinking. The live-to-final redesign (#3401) made the thinking-card-row wrapper class unconditional, which broke finalizeThinkingCard()'s dots-only detection: it treated the wrapper class itself as a "has content" signal, so the dots-only removal branch went dead. Narrow hasContent to the actual .thinking-card element so dots-only spinners are removed on finalize while real Worklog Thinking Cards are preserved. Includes #3869 regression coverage (brace-walks finalizeThinkingCard, asserts the narrowed check + that real thinking cards are not removed).

Co-authored-by: nesquena-hermes <[email protected]>
Co-authored-by: franksong2702 <franksong2702@users.noreply.github.com>

Fixes #3875: chat transcript rendering as only a stack of date separators with no message bodies. The live-to-final/Worklog redesign (#3401) folds intermediate assistant segments into a collapsed Worklog and hides the source segment; when a turn's ONLY content is folded into a collapsed Worklog (empty final assistant message from an interrupted/autonomous run, or a reload where S.toolCalls did not hydrate so the Worklog has no expandable steps), every segment is hidden and the turn paints blank — leaving a bare column of date dividers. Adds a defensive fail-safe invariant at the end of renderMessages(): a settled assistant turn never renders with zero visible content. Blank turns get their folded Worklog expanded (or hidden segments un-hidden as a last resort). Turns with any visible answer are untouched, preserving the intended collapsed-Worklog UX. Reproduced + verified fixed in an isolated browser (clean Chrome profile to defeat the ?v= asset-cache); RED on master (blank 'Worklog' chip), GREEN with the fix (Worklog expanded, content visible). Includes #3875 structural regression coverage.
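
The fail-safe invariant described above ("a settled assistant turn never renders with zero visible content") can be expressed as a small model. The real check lives at the end of renderMessages() in the frontend JavaScript; the dictionary layout used here (`segments`, `hidden`, `worklog`, `steps`, `collapsed`) and the function name are hypothetical and chosen only to show the order of the fallbacks.

```python
def ensure_turn_visible(turn):
    """Mutate a settled turn so that it shows something.

    1. If any segment is already visible and non-empty, leave it alone.
    2. Otherwise expand a folded Worklog that has steps.
    3. As a last resort, un-hide the hidden segments.
    """
    segments = turn.get("segments", [])
    if any(not seg["hidden"] and seg["text"].strip() for seg in segments):
        return turn  # intended collapsed-Worklog UX is preserved
    worklog = turn.get("worklog")
    if worklog and worklog.get("steps"):
        worklog["collapsed"] = False
        return turn
    for seg in segments:
        seg["hidden"] = False
    return turn
```

Turns that already show an answer are untouched, so the fallback only changes the rendering of turns that would otherwise paint blank.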

… + #3887 + #3831) (#3889)

* Release v0.51.342 - Release LF (blank-transcript brick fix #3875)

Fixes #3875: chat transcript rendering as only a stack of date separators with no message bodies. The live-to-final/Worklog redesign (#3401) folds intermediate assistant segments into a collapsed Worklog and hides the source segment; when a turn's ONLY content is folded into a collapsed Worklog (empty final assistant message from an interrupted/autonomous run, or a reload where S.toolCalls did not hydrate so the Worklog has no expandable steps), every segment is hidden and the turn paints blank, leaving a bare column of date dividers.

Adds a defensive fail-safe invariant at the end of renderMessages(): a settled assistant turn never renders with zero visible content. Blank turns get their folded Worklog expanded (or hidden segments un-hidden as a last resort). Turns with any visible answer are untouched, preserving the intended collapsed-Worklog UX.

Reproduced + verified fixed in an isolated browser (clean Chrome profile to defeat the ?v= asset-cache); RED on master (blank 'Worklog' chip), GREEN with the fix (Worklog expanded, content visible). Includes #3875 structural regression coverage.

* docs(ui): clarify revealed-flag intent in #3875 fail-safe (greptile P2)

Address greptile review on PR #3889: the 'revealed' flag means 'turn has a visible non-empty Worklog group' not 'we just expanded one'. An already-open non-empty group is itself visible, so the last-resort un-hide is correctly skipped. Comment-only; no behavior change.

---------

Co-authored-by: nesquena-hermes <[email protected]>

franksong2702 marked this pull request as draft June 2, 2026 11:24

franksong2702 force-pushed the franksong2702/live-to-final-assistant-replies branch from d33dbce to a2bf57a Compare June 2, 2026 11:39

franksong2702 marked this pull request as ready for review June 2, 2026 11:47

franksong2702 marked this pull request as draft June 2, 2026 12:17

franksong2702 marked this pull request as ready for review June 2, 2026 12:49

This was referenced Jun 2, 2026

fix: reattach SSE on session-switch return and preserve live progress (closes #2924) #3005

Closed

Restore visible WebUI progress contract #3015

Closed

franksong2702 marked this pull request as draft June 2, 2026 15:00

franksong2702 force-pushed the franksong2702/live-to-final-assistant-replies branch 3 times, most recently from c96e741 to a000b20 Compare June 2, 2026 20:36

franksong2702 marked this pull request as ready for review June 2, 2026 20:43

This was referenced Jun 3, 2026

RFC: live-to-final replies for long-running sessions #3464

Merged

Redesign live-to-final assistant replies for running agent sessions #3400

Open

Fix early-cancel live stream race #3476

Closed

franksong2702 marked this pull request as draft June 3, 2026 10:09

franksong2702 force-pushed the franksong2702/live-to-final-assistant-replies branch from b62fd31 to 85830b7 Compare June 3, 2026 10:24

franksong2702 marked this pull request as ready for review June 3, 2026 10:43

franksong2702 mentioned this pull request Jun 4, 2026

Coerce reasoning effort to model/provider-supported levels #3505

Closed

greptile-apps Bot reviewed Jun 4, 2026

Comment thread api/routes.py

Comment thread static/sessions.js Outdated

franksong2702 mentioned this pull request Jun 5, 2026

Fix compression-exhausted stream finalization #3316

Closed

franksong2702 pushed a commit to franksong2702/hermes-webui-fork that referenced this pull request Jun 5, 2026

Merge origin/master for PR nesquena#3401 conflict cleanup

72b838f

Frank Song added 2 commits June 6, 2026 18:07

Merge remote-tracking branch 'origin/master' into franksong2702/live-…

8566ebc

…to-final-assistant-replies

# Conflicts:
# api/streaming.py
# static/messages.js
# static/ui.js
# tests/test_issue765_streaming_persistence.py
# tests/test_ui_tool_call_cleanup.py

Close nesquena#3401 merge review test gaps

002a57a

nesquena-hermes mentioned this pull request Jun 6, 2026

[HELD — independent review pending] Release v0.51.294 — stage-3401 (live-to-final redesign #3401 + 4 deep-review fixes) #3741

Merged

nesquena-hermes closed this in e3a7c93 Jun 6, 2026

franksong2702 mentioned this pull request Jun 7, 2026

fix(streaming): replay restored live tool cards on reconnect #3763

Closed

nesquena-hermes mentioned this pull request Jun 7, 2026

Release v0.51.309 — Release JY (#3763 — replay restored live tool cards on reconnect, fixes #3707) #3766

Merged

This was referenced Jun 7, 2026

Preserve live stream output across session switches #3427

Open

feat(chat): render assistant turns as interleaved comment/action bubbles (#3397) #3568

Open

NagaResst mentioned this pull request Jun 8, 2026

Destructive tool calls hidden behind collapsed Activity card — user cannot intervene in time #3813

Open

ai-ag2026 mentioned this pull request Jun 8, 2026

Major UX regression: restore transparent chronological reasoning/tool-call stream in chat UI #3820

Open

franksong2702 mentioned this pull request Jun 9, 2026

docs(rfc): Transparent Stream activity display mode (#3820) #3862

Merged

nesquena-hermes mentioned this pull request Jun 9, 2026

Release v0.51.342 — Release LF (transcript + sidebar reliability: #3875 + #3887 + #3831) #3889

Merged

Redesign live-to-final assistant replies #3401

franksong2702 wants to merge 21 commits into nesquena:master from franksong2702:franksong2702/live-to-final-assistant-replies

franksong2702 commented Jun 2, 2026 (edited)

nesquena-hermes commented Jun 3, 2026

greptile-apps Bot commented Jun 4, 2026 (edited)

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

franksong2702 commented Jun 4, 2026

franksong2702 commented Jun 4, 2026

franksong2702 commented Jun 5, 2026

franksong2702 commented Jun 5, 2026

franksong2702 commented Jun 5, 2026

franksong2702 commented Jun 6, 2026

franksong2702 commented Jun 6, 2026

2 participants

franksong2702 commented Jun 2, 2026 (edited)

Thinking Path

What Changed

Why It Matters

Contract Routing

Verification

Screenshots

Risks / Follow-ups

Model Used

nesquena-hermes commented Jun 3, 2026

The `_token_sent` guard was dropped, which can turn a successful tool-terminal turn into an error

Suggestion
